Response Quality Evaluation in Heterogeneous Question Answering System: A Black-box Approach
Authors
Abstract
The evaluation of question answering systems is a major research area that needs much attention. Before the rise of domain-oriented question answering systems based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when question answering systems began to be more domain-specific, evaluation became a real issue. This is especially true when understanding and reasoning are required to cater for a wider variety of questions and, at the same time, achieve higher-quality responses. The research in this paper discusses the inappropriateness of the existing measures for response quality evaluation and, in a later part, brings forward the call for new standard measures and the related considerations. As a short-term solution for evaluating the response quality of heterogeneous systems, and to demonstrate the challenges in evaluating systems of a different nature, this research presents a black-box approach using observation, a classification scheme and a scoring mechanism to assess and rank three example systems (i.e. AnswerBus, START and NaLURI). Keywords—Evaluation, question answering, response quality.
Similar Articles
A Black-box Approach for Response Quality Evaluation of Conversational Agent Systems
The evaluation of conversational agents, or chatterbot question answering systems, is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when chatterbots began to become more doma...
Cooperation between black box and glass box approaches for the evaluation of a question answering system
For the past three years, the question answering system QALC, currently developed in our team, has been taking part in the Question Answering (QA) track of the TREC (Text REtrieval Conference) evaluation campaigns. In the QA track, each system is evaluated according to a black-box approach: as input, a set of questions, and as output, for each question, five answers ranked with regard to decreasing...
Designing a Realistic Evaluation of an End-to-end Interactive Question Answering System
We report on the development of material for an evaluation exercise designed to assess the overall design and usability of HITIQA, an interactive question-answering system for preparing broad-ranging reports on complex issues. The two basic objectives of the evaluation were (1) to perform a realistic assessment of the usefulness and usability of HITIQA as an end-to-end system, from the informat...
Answer Attenuation in Question Answering
Research in Question Answering (QA) has been dominated by the TREC methodology of black-box system evaluation. This makes it difficult to evaluate the effectiveness of individual components and requires human involvement. We have collected a set of answer locations within the AQUAINT corpus for a sample of TREC questions; in doing so, we also analysed the ability of humans to retrieve answers. Ou...
Can Automatic Post-Editing Make MT More Meaningful?
Automatic post-editors (APEs) enable the re-use of black-box machine translation (MT) systems for a variety of tasks where different aspects of translation are important. In this paper, we describe APEs that target adequacy errors, a critical problem for tasks such as cross-lingual question answering, and compare different approaches to post-editing: a rule-based system and a feedback approach...